snappy: optimize UnalignedCopy64 and IncrementalCopy for RISC-V#3360
snappy: optimize UnalignedCopy64 and IncrementalCopy for RISC-V#3360Felix-Gong wants to merge 2 commits into
Conversation
c9dbd91 to
25a5434
Compare
Use RISC-V inline assembly (ld/sd) for 8-byte copy operations instead of generic macro-based implementation. Changes: - UnalignedCopy64: direct ld/sd pair for 8-byte copy - IncrementalCopy: 8-byte bulk copies when source/dest don't overlap Performance improvement (direct function benchmark): - Decompress compressible-256K: 728 MB/s -> 2205 MB/s (+203%) - Decompress zeros-256K: 543 MB/s -> 1462 MB/s (+169%) Tests: brpc_snappy_compress_unittest passed (7/7) Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn>
25a5434 to
83cd75a
Compare
|
LGTM |
There was a problem hiding this comment.
Pull request overview
This PR aims to improve Snappy decompression performance on RV64 by accelerating 8-byte copy paths in UnalignedCopy64() and IncrementalCopy() using inline RISC-V ld/sd.
Changes:
- Add an RV64-specific 8-byte bulk-copy loop to
IncrementalCopy()when overlap is considered safe. - Add an RV64-specific
ld/sdimplementation forUnalignedCopy64().
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/butil/third_party/snappy/snappy.cc | Adds RV64 inline-asm 8-byte copying to IncrementalCopy() for non-overlapping (or “safe overlap”) cases. |
| src/butil/third_party/snappy/snappy-stubs-internal.h | Adds RV64 inline-asm ld/sd fast path to UnalignedCopy64(). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // RISC-V optimized: use 8-byte copies when possible | ||
| if (len >= 8 && (op - src >= 8 || src - op >= 8)) { | ||
| // Non-overlapping or safe overlap: copy 8 bytes at a time | ||
| do { | ||
| uint64_t tmp; | ||
| __asm__ volatile( | ||
| "ld %0, %1\n\t" | ||
| "sd %0, %2\n\t" | ||
| : "=&r"(tmp) | ||
| : "m"(*(const uint64_t*)src), "m"(*(uint64_t*)op) | ||
| : "memory"); | ||
| src += 8; | ||
| op += 8; | ||
| len -= 8; | ||
| } while (len >= 8); | ||
| } |
There was a problem hiding this comment.
Fixed. Added runtime alignment check with ((uintptr_t)src | (uintptr_t)op) & 7) == 0. Only uses ld/sd when both pointers are 8-byte aligned, otherwise falls back to byte-by-byte copy.
| #if defined(__riscv) && __riscv_xlen == 64 | ||
| // RISC-V optimized: single ld/sd pair for 8-byte copy | ||
| uint64_t tmp; | ||
| __asm__ volatile( | ||
| "ld %0, %1\n\t" | ||
| "sd %0, %2\n\t" | ||
| : "=&r"(tmp) | ||
| : "m"(*(const uint64_t*)src), "m"(*(uint64_t*)dst) | ||
| : "memory"); | ||
| #else |
There was a problem hiding this comment.
Fixed. Added alignment check (((uintptr_t)src | (uintptr_t)dst) & 7) == 0 before using ld/sd. Falls back to memcpy for unaligned addresses.
Address Copilot review: ld/sd require 8-byte alignment. Add runtime alignment check and fall back to memcpy/byte-copy for unaligned addresses to avoid traps on implementations that don't support misaligned access. Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn>
Summary
UnalignedCopy64()with directld/sdinline assembly for 8-byte copyIncrementalCopy()with 8-byte bulk copies when source/dest don't overlapPerformance Improvement (direct function benchmark)
Changes
src/butil/third_party/snappy/snappy-stubs-internal.h: +11 linessrc/butil/third_party/snappy/snappy.cc: +24 linesTest Plan